Greg Detre
@10 on Tuesday, October 01, 2002
Federica Busa present
assignment number 2
started 1985, $3m funding (apparently mainly from CIA etc.), small group of lexicographers
100,000+ entries
of the 75,000 synsets, there are c. 10,000 meronyms
antonymy is a lexical, rather than semantic, relation, e.g. rise/descend aren't antonyms
antonyms:
21% of adjectives, 9% of verbs, 2.4% of nouns
these are very small, given that people can enumerate loads of antonyms for more or less anything (lots that we can think of don't exist, e.g. cat/dog, bachelor)
ones that do exist, e.g. success/failure, best/worst
you could always define non-X as the antonym of every noun :) (Deb)
store adjectives more or less as antonyms
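The point that antonymy links word forms rather than meanings can be sketched in a few lines of plain Python (toy data, not the real WordNet API): rise/fall are antonyms, while the near-synonymous rise/descend pair carries no antonym pointer.

```python
# Toy illustration of antonymy as a lexical (word-form) relation.
# Hypothetical entries; real WordNet stores antonym pointers on
# specific lemmas, not on whole synsets.
ANTONYMS = {
    "rise": "fall",
    "fall": "rise",
    "good": "bad",
    "bad": "good",
}

SYNONYMS = {
    "rise": {"ascend"},
    "fall": {"descend"},
}

def antonym(word):
    """Return the direct lexical antonym, or None if no pointer exists."""
    return ANTONYMS.get(word)

print(antonym("rise"))     # fall
print(antonym("descend"))  # None: semantically opposite to "rise",
                           # but there is no lexical antonym link
```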
hyponymy/hypernymy
is-a relation as Wordnet killer-app
does it distinguish sub-class-of vs instance-of?
e.g. a president is a sub-class of person, while Bill Clinton is an instance-of president
CYC does
there are so many different types of is-a
the same goes for has-a
what about multiple-inheritance (from the top level domains)???
book as artifact vs knowledge, or animal vs meat
"fish" as food is a different synset from "fish" as animal - Deb wants to know whether you do ever want to cross synsets without having to go via the level above
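Deb's complaint can be made concrete with a toy hypernym ("is-a") graph (illustrative data, not real WordNet synsets): the two "fish" senses sit on separate chains that only meet at the top, with no direct link between them.

```python
# Toy hypernym chains showing "fish" in two separate synsets with
# different paths to the top. Hypothetical data for illustration.
HYPERNYM = {
    "fish.animal": "vertebrate",
    "vertebrate": "animal",
    "animal": "organism",
    "organism": "entity",
    "fish.food": "seafood",
    "seafood": "food",
    "food": "substance",
    "substance": "entity",
}

def hypernym_chain(synset):
    """Walk is-a links from a synset up to the root."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

print(hypernym_chain("fish.animal"))
print(hypernym_chain("fish.food"))
# The chains share only the root "entity": to get from the food sense
# to the animal sense you must go up and over, never straight across.
```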
Tom thinks that a lot of the controversial decisions (e.g. the 25-part top-level grouping) are really just implementation decisions, e.g. to make it easier to split it up into files for separate lexicographers
however, they are making a commitment to the idea of synsets, i.e. the idea that words are swappable to some degree - what happens when you meet both units simultaneously?
is the database generated by graduate students, or from corpora/tagged texts?
Federica said that apparently they weren't intending this for NLP tasks, and weren't prepared for the criticism from NLP researchers of how much ambiguity there is
they haven't added noun-verb relations yet, because it's problematic
Adjectives
bipolar model - most adjectives are defined by/in terms of their antonyms
gradation
e.g. saintly, good, worthy, ordinary, unworthy, evil
more than one-dimensional?
e.g. colour, taste, emotions, personality
happiness-sadness vs intensity - difficult to collapse down to one dimension
we may be off by a couple of orders of magnitude in terms of the dimensionality of some adjectives (e.g. taste)
Verb classes
top level, e.g. weather verbs
lexical entailment, e.g. sleeping/snoring
troponyms (invented word) - particular ways to do something, e.g. marching as a way of walking
Two major gripes:
too many disconnected synsets, not clustered enough
useless glosses
untagged, unrelated to each other
X/Wordnet 2 or something trying to solve this problem???
taxonomic issues
a lot of people use it, but is it good for anything?
often used in query-expansion (e.g. in information-retrieval), sense disambiguation, sense-tagging (given a "context", find best-fitting subtree), semantic distance, topic clustering
perhaps it�ll be really good once you�ve already boot-strapped to a certain point
by that point, it might just be able to read a dictionary - but WN does have extra, more machine-readable information
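One of the uses listed above, semantic distance, can be sketched as shortest-path length through a toy is-a taxonomy (the same idea as WordNet-style path similarity; the taxonomy here is hypothetical):

```python
# Path-based semantic distance over a toy is-a taxonomy: distance is
# the number of edges on the shortest path through the lowest common
# ancestor. Illustrative data only.
HYPERNYM = {
    "cat": "mammal",
    "dog": "mammal",
    "mammal": "animal",
    "trout": "fish",
    "fish": "animal",
}

def ancestors(node):
    """Node plus all its hypernyms, bottom to top."""
    chain = [node]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def path_distance(a, b):
    """Edges from a up to the lowest common ancestor, plus edges down to b."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    for i, node in enumerate(chain_a):
        if node in chain_b:
            return i + chain_b.index(node)
    return None  # no common ancestor in this fragment

print(path_distance("cat", "dog"))    # 2 (cat -> mammal <- dog)
print(path_distance("cat", "trout"))  # 4 (via animal)
```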
combining work in AI, phil of language, generative syntax
Tom thinks it seems to be making more claims than the Wordnet people
does it rest on anything empirically???
depends on what you take to be empirical data
problems with consistency?
Generative Lexicon went through two stages:
earlier years, straight theoretical, almost Chomskyan
later, actually started to deploy the Generative Lexicon system
the paper is kind of in between those
applications of GL:
Simple
NLP
EuroWordNet uses the GL top ontology
the idea of qualia structure
Julius Moravcsik (Aristotle's 4 causes) - meaning of a word:
constituency
generic domain of application
functional element in meaning
causal origin
Deb thinks there's a fifth: a theory of how it works (is this his own idea???)
e.g.:
person: theory of mind
aeroplane: naïve theory of aerodynamics
they don�t think this is reflected in linguistic data, e.g. in how noun-noun compounds work, or how you account for long-distance dependencies in the syntax (???)
adjectives sometimes can�t be used in syntactically identical expressions in composition with certain nouns
e.g.
"a good rock" is ok if uttered by a climber
# "a good cloud"
an old swimmer
a person who is a swimmer and is old
a person who has been swimming for a long time
an old fish
a fish who is old
* a fish that has been swimming for a long time
an old story
a story that was written a long time ago
* a story I have been reading for a long time
they want to base the interpretation on the adjective/verb
forces you to enumerate all of the different varieties of "good"
what kind of granularity do you need to have? how do you know to rule out the ones that are funny (e.g. a good cloud - what about pilots, picnics, cloud-centric cultures???)?
surely you can't enumerate all the different permissible contexts etc.???
I think that "good" is definitely anchored to function - is this controversial???
no, this is what they go on to argue - "good" is related to "what I do with it"
in terms of "old", it's more related to "how the entity came about"
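The qualia structure mentioned earlier makes this concrete: "good" selects the telic role, "old" the agentive role. A toy sketch of the four roles as a data structure (entries are illustrative, not from any real GL lexicon; the four roles roughly line up with Moravcsik's four factors: constituency -> constitutive, domain -> formal, function -> telic, causal origin -> agentive):

```python
from dataclasses import dataclass

@dataclass
class Qualia:
    """Sketch of GL qualia structure for one lexical entry."""
    constitutive: str  # what it is made of / its parts
    formal: str        # what kind of thing it is
    telic: str         # what it is for
    agentive: str      # how it came about

# Illustrative entry, not from any real GL lexicon
book = Qualia(
    constitutive="pages, text",
    formal="physical object / information",
    telic="reading",
    agentive="writing, printing",
)

# "a good book" picks out the telic role; "an old book" the agentive one
print(book.telic, book.agentive)
```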
Can you study the structure of concepts in the same way as you study syntax?
Are there generalisations that can be made about what drives lexical inference?
Can you separate analytical knowledge from contextually determined interpretations?
Can you set up an empirical basis for motivating lexical representations?
Can you capture abstractions without enumeration?
they look at data in a linguistic empirical way (I think???)
Tom thinks that you could build an intelligent system without differentiating between nouns and verbs (for example)
he thinks they�re making assumptions about meaning/language
assumption that we can get at stuff/understanding (of what???) by studying language in general
concepts have different degrees of complexity/richness
and so different numbers of inferences
this applies across (PoS) categories
whereas Wordnet uses different strategies for different categories
but I would argue that this is because they assume that we (brains) use different strategies/representations for different categories
the same concept can be lexicalised in one or more ways (i.e./e.g. as a noun or verb or both)
children�s data
easier mapping between parts of speech and primitives - the linguistic data is one lens onto the concepts - perhaps you can see why certain primitive parts of speech emerge early
Tom argues that looking at the language alone won't tell you enough - it's a surface representation of something deeper - don't make the structure of language the end-goal if you're interested in the conceptual structure
language is not the only lens on conceptual structure
how much of Wordnet could you discover just from the linguistic data?
language impairment
patient being completely incapable of talking about containers
e.g. "the frog is in the …"
just how generative is the GL?
generate your entire ontology from combinations of qualia
Pustejovsky - don't enumerate word senses because there are too many ways in which two words can combine
lexical database vs knowledge base???
encoded database term-property relations vs semantic relations (smartness)
put presentations online???
IR??? information-retrieval
why is it called "Generative Lexicon"??? what does it mean???
when they say they�re thinking about its uses for AI, where exactly would it fit in/what would it do???
what is the difference between lexical and semantic knowledge??? what do you mean by the "lexicon"???
media lab vs AI lab???